AITopics | descent direction

Objective Soups: Multilingual Multi-Task Modeling for Speech Processing

Neural Information Processing Systems | Jun-10-2026, 18:36:55 GMT

The need for training multilingual multi-task speech processing (MSP) models that perform both automatic speech recognition and speech-to-text translation is increasingly evident. However, a significant challenge arises from the conflicts among multiple objectives when using a single model. Multi-objective optimization can address this challenge by facilitating the optimization of multiple conflicting objectives and aligning the gradient updates in a common descent direction. While multi-objective optimization helps avoid conflicting gradient updates, a critical issue is that when there are many objectives, such as in MSP, it is often difficult to find a common descent direction. This leads to an important question: Is it more effective to separate highly conflicting objectives into different optimization levels or to keep them in a single level? To address this question, this paper investigates three multi-objective MSP formulations, which we refer to as objective soup recipes. These formulations apply multi-objective optimization at different optimization levels to mitigate potential conflicts among all objectives. To keep computation and memory overhead low, we incorporate a lightweight layer selection strategy that detects the most conflicting layers and uses only their gradients when computing the conflict avoidance direction. We conduct an extensive investigation using the CoVoST v2 dataset for combined multilingual ASR and ST tasks, along with the LibriSpeech and AISHELL-1 datasets for multilingual ASR, to identify highly conflicting objectives and determine the most effective training recipe among the three proposed multi-objective optimization algorithms.
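The two ideas named in the abstract, a common descent direction and selection of the most conflicting layers, can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes only two objectives, uses the standard closed-form min-norm (MGDA-style) weighting, and measures conflict by per-layer cosine similarity, which is an assumption made here. All function names (`min_norm_weight`, `common_descent_direction`, `most_conflicting_layers`) are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_weight(g1, g2):
    """Weight a in [0, 1] that minimizes ||a*g1 + (1-a)*g2||^2.

    Closed form for the two-objective case (an assumption of this sketch).
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:  # identical gradients: every weight gives the same vector
        return 0.5
    a = dot([y - x for x, y in zip(g1, g2)], g2) / denom
    return min(1.0, max(0.0, a))  # clip to the simplex [0, 1]


def common_descent_direction(g1, g2):
    """Negative min-norm convex combination of the two gradients.

    The result has a non-positive inner product with both g1 and g2,
    so a small step along it increases neither objective to first order.
    """
    a = min_norm_weight(g1, g2)
    return [-(a * x + (1.0 - a) * y) for x, y in zip(g1, g2)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is zero."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu > 0 and nv > 0 else 0.0


def most_conflicting_layers(grads_a, grads_b, k):
    """Names of the k layers with the lowest cosine similarity.

    grads_a and grads_b map layer name -> flattened gradient (hypothetical
    layout); low or negative cosine is used here as the conflict signal.
    """
    scores = {name: cosine(grads_a[name], grads_b[name]) for name in grads_a}
    return sorted(scores, key=scores.get)[:k]


if __name__ == "__main__":
    g_asr, g_st = [1.0, 0.0], [-0.5, 1.0]  # toy gradients that partly conflict
    d = common_descent_direction(g_asr, g_st)
    print(dot(d, g_asr) <= 0.0, dot(d, g_st) <= 0.0)
```

With more than two objectives the min-norm weights no longer have a closed form and are usually found with a small quadratic program or a Frank-Wolfe loop. Restricting that computation to a few selected layers is one way to keep the overhead low.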

artificial intelligence, optimization problem, proceedings

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.96)

A Stein variational Newton method

Gianluca Detommaso, Tiangang Cui, Youssef Marzouk, Alessio Spantini, Robert Scheichl

Neural Information Processing Systems | Feb-15-2026, 09:43:13 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, approximation, kernel

Country:

North America > United States > Massachusetts (0.04)
Oceania > Australia (0.04)
North America > Canada > Quebec > Montreal (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.32)

527d9d8f89aec80d634e366a97f49ba8-Paper-Conference.pdf

Neural Information Processing Systems | Feb-13-2026, 07:58:10 GMT

gradient descent, neural network, neuron

Country: Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Uncertainty Sampling is Preconditioned Stochastic Gradient Descent on Zero-One Loss

Stephen Mussmann, Percy S. Liang

Neural Information Processing Systems | Feb-12-2026, 21:57:40 GMT

Neural Information Processing Systems http://nips.cc/

dataset, learning, zero-one loss

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

685bfde03eb646c27ed565881917c71c-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-12-2026, 11:07:06 GMT

descent direction, pareto mtl, revision

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

A Proofs

Neural Information Processing Systems | Feb-11-2026, 06:27:03 GMT

Lemma 1. Assume that Assumptions 1 and 2 hold. Then the iterations satisfy the following inequality for all k ∈ ℕ: [...] Combining Assumption 2 with Definition 4.6, we have the second moment of g(W [...] Summing both sides of this inequality for k ∈ {1,...,K} and recalling Assumption 2(a) gives [...] Rearranging the above inequality and dividing further by K yields the result. The second condition in Eq. 4.10 ensures that lim [...] Summing both sides of this inequality for k ∈ {1,...,K} and recalling Assumption 2(a) gives [...] It guarantees that the model moves in a descent direction of the loss function. Following the experimental setup in Section 5.1, we demonstrate that the proposed method empirically satisfies Assumption 2(b), and visualize in Figure 7 the sufficient direction constant µ for the (partial) convolutional layers of the four models during end-to-end training using TREC. For SqueezeNet and ResNet-34, we show one block as the representative, since the other blocks behave similarly. Several insights can be drawn from Figure 7. (i) The value of µ for each convolutional layer is consistently greater than zero, indicating that Assumption 2(b) is satisfied, which further ensures the convergence of the TREC-equipped CNNs.

artificial intelligence, assumption 2, machine learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)

82846e19e6d42ebfd4ace4361def29ae-Paper-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 08:17:13 GMT

gradient, neural network, time step

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.67)

Industry: Education > Educational Setting > Online (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

96f2d6069db8ad895c34e2285d25c0ed-Supplemental.pdf

Neural Information Processing Systems | Feb-9-2026, 10:45:07 GMT

Smooth convex optimization problems over polytopes are an important class of problems that appear in many settings, such as low-rank matrix completion [1], structured supervised learning [2,3], electrical flows over graphs [4], video co-localization in computer vision [5], traffic assignment problems [6], and submodular function minimization [7].

artificial intelligence, optimization problem, xi 1

Country: